Markov Model Variants for Appraisal of Coding Potential in Plant DNA
Abstract
Markov chain models are commonly used for content-based appraisal of coding potential in genomic DNA. The ability of these models to distinguish coding from non-coding sequences depends on the method of parameter estimation, the validity of the estimated parameters for the species of interest, and the extent to which oligomer usage characterizes coding potential. We assessed the performance of Markov chain models in two model plant species, Arabidopsis and rice, comparing canonical fixed-order, χ²-interpolated, and top-down and bottom-up deleted interpolated Markov models. All methods achieved comparable identification accuracies, with differences usually within statistical error. Because classification performance is related to G+C composition, we also considered a strategy where training and test data are first partitioned by G+C content. All methods demonstrated considerable gains in accuracy under this approach, especially in rice. The methods studied were implemented in the C programming language and organized into a library, IMMpractical, distributed under the GNU LGPL.
Related resources
Appraisal of the entire mitochondrial genome for DNA barcoding in birds
DNA barcoding based on a standardized region of 648 base pairs of mitochondrial DNA sequences from Cytochrome C Oxidase 1 (COX1) is proposed for animal species identification. Recent studies suggested that DNA barcoding has been effective for identifying 94% of bird species. The proposed threshold of 10 times the average intraspecific variation could be used for the identification and delimitation ...
Evaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes
Growing amount of information on biological sequences has made application of statistical approaches necessary for modeling and estimation of their functions. In this paper, sensitivity and specificity of the first and second Markov chains for prediction of genes was evaluated using the complete double stranded DNA virus. There were two approaches for prediction of each Markov Model parameter,...
A Markov Model for Performance Evaluation of Coal Handling Unit of a Thermal Power Plant
The present paper discusses the development of a Markov model for performance evaluation of coal handling unit of a thermal power plant using probabilistic approach. Coal handling unit ensures proper supply of coal for sound functioning of thermal Power Plant. In present paper, the coal handling unit consists of two subsystems with two possible states i.e. working and failed. Failure and repair...
Comprehensive Computational Analysis of Protein Phenotype Changes Due to Plausible Deleterious Variants of Human SPTLC1 Gene
Genetic variations found in the coding and non-coding regions of a gene are known to influence the structure as well as the function of proteins. Serine palmitoyltransferase long chain subunit 1 a member of α-oxoamine synthase family is encoded by SPTLC1 gene which is a subunit of enzyme serine palmitoyltransferase (SPT). Mutations in SPTLC1 have been associated with hereditary sensory and auto...
Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)
Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...